A non-reproducible workflow - https://www.youtube.com/watch?v=s3JldKoA0zw&t=5s
That’s why we are playing around with RMarkdown today.
Clearly, there’s no better way of doing so than throwing in a bunch of cats.
RMarkdown is the easiest way to create interactive documents integrating text, code and output from your code. It is fairly versatile, has a shallow learning curve and, as it often happens in R, there is a bunch of people continuously expanding its functionalities and possibilities.
For instance, you can:
Yes, you guessed that. This very document has been generated using RMarkdown.
Not all that glitters is gold, though. Expect headaches when setting up your machine (especially on Windows) if you want to knit pdfs or do more advanced stuff.
Source: https://bookdown.org/yihui/rmarkdown/
The document format “R Markdown” was first introduced in the knitr package (Xie 2015, 2020c) in early 2012. The idea was to embed code chunks (of R or other languages) in Markdown documents.
However, the original version of Markdown invented by John Gruber was often found overly simple and not suitable to write highly technical documents. For example, there was no syntax for tables, footnotes, math expressions, or citations. Fortunately, John MacFarlane created a wonderful package named Pandoc (http://pandoc.org) to convert Markdown documents (and many other types of documents) to a large variety of output formats. More importantly, the Markdown syntax was significantly enriched. Now we can write more types of elements with Markdown while still enjoying its simplicity.
In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).
The rmarkdown package (Allaire, Xie, McPherson, et al. 2020) was first created in early 2014.
Markdown is a lightweight markup language that you can use to add formatting elements to plaintext text documents. Created by John Gruber in 2004, Markdown is now one of the world’s most popular markup languages.
It’s the language used to create READMEs and project descriptions on GitHub.
LaTeX, which is pronounced «Lah-tech» or «Lay-tech» (to rhyme with «blech» or «Bertolt Brecht»), is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents but it can be used for almost any form of publishing.
LaTeX is not a word processor! Instead, LaTeX encourages authors not to worry too much about the appearance of their documents but to concentrate on getting the right content.
You need LaTeX only for knitting to pdf. If you’re happy with html notebooks, there’s no need to install it.
If you need to convert files from one markup format into another, pandoc is your swiss-army knife.
The good news is that your RStudio IDE already ships with a Pandoc installation! Maybe it’s not the latest, but it will work just fine most of the time.
Almost. Fire up your RStudio and install RMarkdown first.
install.packages("rmarkdown")
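To confirm that the installation worked, something like the following can be run in the console. This is only a minimal sketch: `requireNamespace()` checks for the package without attaching it, and `rmarkdown::pandoc_available()` reports whether rmarkdown can find a Pandoc binary. Outside RStudio, Pandoc may not be found unless it is installed separately.

```r
# Check whether rmarkdown is installed, without attaching it
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

if (has_rmarkdown) {
  message("rmarkdown version: ", as.character(packageVersion("rmarkdown")))
  # TRUE when rmarkdown can find a Pandoc binary (e.g. the one bundled with RStudio)
  message("Pandoc available: ", rmarkdown::pandoc_available())
} else {
  message('rmarkdown is missing; run install.packages("rmarkdown")')
}
```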
Time to start your first RMarkdown document.
File -> New File -> R Markdown
We’ll stick to html today. Knitting documents to pdf is undoubtedly cooler, but requires installing LaTeX, which might be tricky, depending on the machine one is using.
Congratulations - You’ve just created your first RMarkdown document.
You should be seeing something like this: The first part is called the metadata.
The metadata is written between the pair of three dashes `---`. The syntax for the metadata is YAML (YAML Ain’t Markup Language, https://en.wikipedia.org/wiki/YAML), so sometimes it is also called the YAML metadata or the YAML frontmatter. Before it bites you hard, we want to warn you in advance that indentation matters in YAML, so do not forget to indent the sub-fields of a top field properly.
(Source - https://bookdown.org/yihui/rmarkdown/basics.html).
It sounds intimidating. However, you won’t have to do much with your metadata most of the time, besides copy pasting it from a template. Pheeeewww!
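To see why indentation matters, one option is to parse two small headers from R and compare the result. The sketch below assumes the yaml package is available (it is installed as a dependency of rmarkdown); the title and the `toc` option are just illustrative values, not taken from the template.

```r
# Parse two YAML headers to see how indentation changes the structure.
# Assumes the yaml package is installed (a dependency of rmarkdown).
good <- "
title: Big cats
output:
  html_document:
    toc: true
"

# Same fields, but html_document is NOT indented under output
bad <- "
title: Big cats
output:
html_document:
  toc: true
"

meta_good <- yaml::yaml.load(good)
meta_bad  <- yaml::yaml.load(bad)

# Properly indented: toc is an option of html_document, which sits under output
str(meta_good$output)

# Not indented: output is empty, and html_document became a separate top-level field
is.null(meta_bad$output)
names(meta_bad)
```

In the second header nothing is syntactically wrong, which is exactly why this kind of mistake is easy to miss: the file parses, but the options are no longer attached to `output`.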
In the rest of the script you can distinguish between text and chunks of code. Note how chunks of code are enclosed by three backtick signs: `` ``` ``.
If you can’t find the backtick sign on your keyboard, try the ASCII code: Alt+96.
Look at the two buttons highlighted by the arrows.
knit my notebook?
Knitting your RMarkdown script means rendering it to your chosen output (html in this case). There is quite a lot of machinery (=dark magic) happening behind the scenes. Fortunately, for most applications we don’t have to understand how these happen. Just enjoy the result.
Workflow - Source: ‘https://bookdown.org/yihui/rmarkdown-cookbook’
RMarkdown document?
Let’s start with the easiest: how to add text (=narrative) to my RMarkdown. Easy-peasy. Just type!
You just need to know a couple of things:
Going to a new line -> Add two spaces at the end of the line.
Leaving an empty line -> Two ways: 1. <br>, 2. \newline (followed by an empty line).
Titles and Headers:
# Header - Header 1
## Header - Header 2
### Header - Header 3
Basic formatting. You can use some basic markdown formatting to make your text:
Italic: `*Felis catus*` -> *Felis catus*
Italic: `_Felis catus_` -> _Felis catus_
Bold: `**Felis catus**` -> **Felis catus**
Bold: `__Felis catus__` -> __Felis catus__
Both: `***Felis catus***` -> ***Felis catus***
Both: `___Felis catus___` -> ___Felis catus___
Adding Lynx:
`[Eurasian Lynx - Wikipedia](https://en.wikipedia.org/wiki/Eurasian_lynx)`
Renders as:
[Eurasian Lynx - Wikipedia](https://en.wikipedia.org/wiki/Eurasian_lynx)
Embed cat from url/file:
<center> {height=300px} </center>
Renders as:
Create numbered lists of cats (don’t forget to leave an empty line before starting the list). E.g.,
My favourite wildcats:
1. Andean Cat (*Leopardus jacobita*)
2. Rusty Spotted Cat (*Prionailurus rubiginosus*)
3. Chinese Mountain Cat (*Felis bieti*)
4. Kodkod (*Leopardus guigna*)
Renders as:
Create unordered lists of cats:
Where small cats live:
* Small Cats of South America
    + Andean Cat
    + Geoffroy’s Cat
    + Jaguarundi
    + ...
* Small Cats of SE Asia
    + Leopard Cat
    + Marbled Cat
    + Fishing Cat
    + ...

Renders as:
Do Yourself a Favour - and download the catsheet (=cheatsheet)
Catsheet - https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
To insert a chunk of code, enclose it between two lines of three backticks, with {r} right after the opening backticks:
```{r}
4+4
```
It will render as:
4+4
## [1] 8
Let’s practice. First, we need to import some cat-related data.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.3
## Warning: package 'tibble' was built under R version 4.0.3
## Warning: package 'tidyr' was built under R version 4.0.3
## Warning: package 'readr' was built under R version 4.0.3
## Warning: package 'dplyr' was built under R version 4.0.3
## Warning: package 'forcats' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
big.cats <- read.table("data/Wikipedia_LargestCats.txt", header = T, sep="\t")
big.cats
## Rank Common.name Scientific.name Weight.range.kg
## 1 1 Tiger Panthera tigris 90-300
## 2 2 Lion Panthera leo 160-270
## 3 3 Jaguar Panthera onca 56-120
## 4 4 Cougar Puma concolor 53-100
## 5 5 Leopard Panthera pardus 17-90
## 6 6 Cheetah Acinonyx jubatus 20-60
## 7 7 Snow leopard Panthera uncia 22-55
## 8 8 Eurasian lynx Lynx lynx 15-45
## 9 9 Sunda clouded leopard Neofelis diardi 12-26
## 10 10 Clouded leopard Neofelis nebulosa 11.5-23
## Maximum.weight.kg Maximum.length.m Native.range.by.continent
## 1 388.78 4.17 Asia
## 2 375.00 3.64 Asia, Africa, Europe
## 3 160.00 2.60 North and South America
## 4 125.20 2.80 North and South America
## 5 96.50 2.75 Asia, Africa, Europe
## 6 72.00 2.10 Africa, Asia, Europe
## 7 75.00 2.50 Asia
## 8 38.00 1.50 Asia, Europe
## 9 27.00 1.30 Asia
## 10 23.00 1.08 Asia
This data.frame contains the weight range, and the maximum observed weights and lengths of the ten largest wildcats. (Source: Wikipedia).
When knitting this code, we get a bunch of messages and warnings from loading the tidyverse package. Not so nice in a report. You can deactivate them by opening your chunk with {r, warning=F, message=F}.
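If you would rather not repeat these options in every chunk, knitr also lets you set them once for the whole document. A minimal sketch, assuming you place it in a first chunk of your .Rmd (often named `setup` and opened with `include=FALSE`):

```r
# Set default chunk options for all chunks that follow in the document
knitr::opts_chunk$set(
  message = FALSE,  # hide messages such as the tidyverse start-up banner
  warning = FALSE   # hide warnings such as "built under R version ..."
)
```

Options given in an individual chunk header still override these defaults, so a single chunk can switch messages back on with `message=TRUE`.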
Let’s do some real R code. In the chunk below, we split the weight range field into min and max using some fancy tidyr and dplyr code.
big.cats <- big.cats %>%
  separate(Weight.range.kg, into = c("Weight.min", "Weight.max"), sep = "-", remove = TRUE) %>%
  mutate(Weight.min = as.numeric(Weight.min),
         Weight.max = as.numeric(Weight.max)) %>%
  mutate(Common.name = factor(Common.name, levels = big.cats$Common.name))
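If you don't have the data file at hand, the same split can be tried on a toy table. The sketch below uses base R only (so it is not the tidyverse pipeline above, just an equivalent illustration); the three rows are copied from the printed table, and `toy` is a made-up name.

```r
# Toy subset of the table above (values copied from the printed output)
toy <- data.frame(
  Common.name     = c("Tiger", "Lion", "Clouded leopard"),
  Weight.range.kg = c("90-300", "160-270", "11.5-23"),
  stringsAsFactors = FALSE
)

# Split each "min-max" string at the dash
parts <- strsplit(toy$Weight.range.kg, "-", fixed = TRUE)

# First piece is the minimum, second piece the maximum; convert text to numbers
toy$Weight.min <- as.numeric(sapply(parts, `[`, 1))
toy$Weight.max <- as.numeric(sapply(parts, `[`, 2))

# Drop the original text column, as remove = TRUE does in separate()
toy$Weight.range.kg <- NULL
toy
```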
Maybe in your report you want to automatically include a value calculated in your R script. This can be done with inline code: enclose the code in single backticks and start it with r. For instance, the code:
The cat having the highest weight is `r big.cats[1,'Common.name']`. It weighs up to `r big.cats[1,'Maximum.weight.kg']` kg.
Renders as:
The cat having the highest weight is Tiger. It weighs up to 388.78 kg.
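The inline code above picks row 1, which works because the table happens to be sorted by rank. A slightly more robust variant, sketched here on a deliberately unsorted toy table (`toy` and `sentence` are illustrative names), looks up the heaviest cat with `which.max()`:

```r
# Toy table, deliberately NOT sorted by weight
toy <- data.frame(
  Common.name       = c("Lion", "Tiger", "Jaguar"),
  Maximum.weight.kg = c(375, 388.78, 160),
  stringsAsFactors  = FALSE
)

# Row holding the largest maximum weight, wherever it sits in the table
heaviest <- toy[which.max(toy$Maximum.weight.kg), ]

# The same values could be dropped into the text with inline code
sentence <- sprintf("The heaviest cat is the %s, at up to %.2f kg.",
                    heaviest$Common.name, heaviest$Maximum.weight.kg)
sentence
```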
A couple of useful arguments when you start your chunk:
{r echo=F} - Runs the code and shows its output, but hides the code itself.
{r eval=F} - The opposite: shows the code, but doesn’t run it.
THE THING about RMarkdown is that it allows embedding graphs directly to your document. Input changed? Just re-knit and you’ll have all your graphs updated. It’s as easy as simply running a chunk of code.
Let’s make a graph to see the differences in cat size more easily.
# plot the weight range of each cat as a double-headed arrow
# remotes::install_github("hilaryparker/cats") # if needed
library(cats)
ggcats <- ggplot(data = big.cats) +
  cats::add_cat() + # add a random cat image in the background of the graph, if you fancy
  geom_segment(aes(y = Common.name, yend = Common.name, x = Weight.min, xend = Weight.max),
               arrow = arrow(length = unit(5, "points"),
                             ends = "both", type = "closed", angle = 40)) +
  ylab(NULL) +
  xlab(NULL) +
  theme(axis.text = element_text(size = 14))
ggcats
Some useful figure related arguments here:
{r fig.height=3} - picture height in inches
{r fig.width=3} - picture width in inches
{r fig.cap="add caption here"} - Add a caption
{r fig.align="center"} - Horizontal alignment of your graph
{r dpi=150} - Change resolution of output image (mostly relevant for pdf)
{r, fig.height=3, fig.width=4, fig.cap="Weight (kg) of the 10 largest wild cats", fig.align="center", dpi=150, echo=F}
Weight (kg) of the 10 largest wild cats
Note how the code isn’t visible anymore, having set echo=F. I swear it’s there, though.
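For experimenting with these figure options, here is a stripped-down version of the weight-range plot that runs without the data file or the cats package. It is only a sketch: the three rows are copied from the table above, and `toy` and `p` are made-up names. In an .Rmd, the chunk holding it could be opened with any of the figure arguments listed above.

```r
library(ggplot2)

# Three rows copied from the table above; the factor levels fix the order on the axis
toy <- data.frame(
  Common.name = factor(c("Tiger", "Lion", "Jaguar"),
                       levels = c("Tiger", "Lion", "Jaguar")),
  Weight.min  = c(90, 160, 56),
  Weight.max  = c(300, 270, 120)
)

# One horizontal double-headed arrow per cat, from minimum to maximum weight
p <- ggplot(toy) +
  geom_segment(aes(x = Weight.min, xend = Weight.max,
                   y = Common.name, yend = Common.name),
               arrow = arrow(length = unit(5, "points"),
                             ends = "both", type = "closed")) +
  labs(x = "Weight (kg)", y = NULL)
p
```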
If you just print an R object on the console as we did above, RMarkdown will show it in the rendered document as well. It won’t look that good, though.
The entry level way of rendering tables is the knitr::kable function.
knitr::kable(head(big.cats[1:4,1:5]), caption="The largest cats!")
| Rank | Common.name | Scientific.name | Weight.min | Weight.max |
|---|---|---|---|---|
| 1 | Tiger | Panthera tigris | 90 | 300 |
| 2 | Lion | Panthera leo | 160 | 270 |
| 3 | Jaguar | Panthera onca | 56 | 120 |
| 4 | Cougar | Puma concolor | 53 | 100 |
An even nicer way is using the kableExtra package. For instance, when rendering to html, kableExtra allows the creation of responsive tables. This is extremely useful for tables wider than an A4 page.
library(kableExtra)
knitr::kable(big.cats, caption="The largest cats!") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
position = "center")
| Rank | Common.name | Scientific.name | Weight.min | Weight.max | Maximum.weight.kg | Maximum.length.m | Native.range.by.continent |
|---|---|---|---|---|---|---|---|
| 1 | Tiger | Panthera tigris | 90.0 | 300 | 388.78 | 4.17 | Asia |
| 2 | Lion | Panthera leo | 160.0 | 270 | 375.00 | 3.64 | Asia, Africa, Europe |
| 3 | Jaguar | Panthera onca | 56.0 | 120 | 160.00 | 2.60 | North and South America |
| 4 | Cougar | Puma concolor | 53.0 | 100 | 125.20 | 2.80 | North and South America |
| 5 | Leopard | Panthera pardus | 17.0 | 90 | 96.50 | 2.75 | Asia, Africa, Europe |
| 6 | Cheetah | Acinonyx jubatus | 20.0 | 60 | 72.00 | 2.10 | Africa, Asia, Europe |
| 7 | Snow leopard | Panthera uncia | 22.0 | 55 | 75.00 | 2.50 | Asia |
| 8 | Eurasian lynx | Lynx lynx | 15.0 | 45 | 38.00 | 1.50 | Asia, Europe |
| 9 | Sunda clouded leopard | Neofelis diardi | 12.0 | 26 | 27.00 | 1.30 | Asia |
| 10 | Clouded leopard | Neofelis nebulosa | 11.5 | 23 | 23.00 | 1.08 | Asia |
It won’t look that good in a pdf, but in an html report it’s as good as it gets.
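Before reaching for extra packages, it may be worth knowing that `knitr::kable()` itself can tidy up a table a bit. The sketch below uses its `col.names` and `digits` arguments on a two-row toy table; the column labels are my own wording, not part of the original data.

```r
# Two rows copied from the table above
toy <- data.frame(
  Common.name       = c("Tiger", "Lion"),
  Weight.min        = c(90, 160),
  Maximum.weight.kg = c(388.78, 375),
  stringsAsFactors  = FALSE
)

tab <- knitr::kable(
  toy,
  col.names = c("Common name", "Min weight (kg)", "Max recorded weight (kg)"),  # readable headers
  digits    = 1,                                                                # round numeric columns
  caption   = "The largest cats!"
)
tab
```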
Whenever you knit your RMarkdown report, R will re-run all the code contained in your .Rmd script. It’s therefore not the best idea to include a chunk of code taking 3 hours to run. If you do so, even just correcting a typo in the text will require you to wait three hours before the corrected version of your report is rendered!
There are a couple of workarounds, though.
add the argument {r cache=T} to your chunk:
Adding this argument to the slowest chunks of code will save the intermediate results of these chunks in a dedicated folder. RMarkdown will only rerun these chunks when their code changes. Otherwise, it will skip the chunk and load the cached results directly.
{r eval=F} + save and load
Sometimes, it might be more convenient to run slow chunks of code in a dedicated, interactive session (maybe even on a different machine), save the results in a .RData file, and reload this saved data in your report. For transparency, you might still show the code you used to produce these intermediate outputs, but setting eval=F, you’ll tell RMarkdown not to run this code.
Something like this.
Chunk 1 with slow code I ran elsewhere:
```{r eval=F}
output <- slow_function(input) # placeholder for a really SLOW chunk of code
save(output, file = 'output.slow.RData')
```
Chunk 2 reimporting the output of chunk 1:
```{r}
load(file = 'output.slow.RData')
```
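The save-and-reload pattern can be tried end to end with a toy example. In this sketch, `slow_square()` is a hypothetical stand-in for your slow code, and the file goes to a temporary folder. It uses `saveRDS()`/`readRDS()` rather than `save()`/`load()`; that is just an alternative that stores a single object and lets you choose its name when reading it back.

```r
# Hypothetical stand-in for a slow computation
slow_square <- function(x) {
  Sys.sleep(0.1)  # pretend this takes hours
  x^2
}

out_file <- file.path(tempdir(), "output.slow.rds")

# Step 1 (run once, possibly elsewhere): compute and save the result
output <- slow_square(1:5)
saveRDS(output, file = out_file)

# Step 2 (in the report): reload the saved result instead of recomputing it
output_reloaded <- readRDS(out_file)
identical(output, output_reloaded)
```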
Clicking on the knit button is convenient. The keyboard shortcut Ctrl+Shift+K is even more so.
Sometimes you might want to knit multiple documents (yes, you can loop across several .Rmd scripts and knit hundreds of reports, even in parallel). To do so, you can render an .Rmd file from the console with:
rmarkdown::render("KateRMarkdown.Rmd")
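Looping over several reports could look roughly like this. It is a sketch, not a tested production script: `render_all()` is a hypothetical helper, `"reports"` is a made-up folder name, and the files are rendered one after another rather than in parallel.

```r
# Hypothetical helper: render every .Rmd file found in a folder, sequentially
render_all <- function(dir = ".") {
  rmd_files <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  vapply(
    rmd_files,
    function(f) {
      # Each report gets a fresh environment, so objects don't leak between reports.
      # render() returns the path of the output file it created.
      rmarkdown::render(f, quiet = TRUE, envir = new.env())
    },
    character(1)
  )
}

# outputs <- render_all("reports")  # "reports" is a made-up folder name
```

Rendering each file in its own environment is a design choice: it keeps one report from silently depending on objects created by another.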
I know. You can’t wait to produce your fantastic pdf reports, and write your next paper directly in R.
Good news - it is possible. There are many templates out there, and you just have to fetch them. See for instance: https://t.co/uJBqWER5h6?amp=1
Yes, there are templates which will also format bibliographic references as requested by different journals.
Bad news - you’ll need to set up your machine first, and it might be tricky sometimes. You can find some guidance at: https://bookdown.org/yihui/rmarkdown-cookbook/install-latex.html.
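One common route for the setup is the tinytex package, which installs a lightweight LaTeX distribution from within R. The sketch below leaves the installation lines commented out, because they download and install software; the check at the end only tells you whether TinyTeX is the LaTeX distribution R finds, and `latex_ready` is an illustrative name.

```r
# One-off setup, commented out because it downloads and installs software:
# install.packages("tinytex")
# tinytex::install_tinytex()

# Afterwards: is the tinytex package there, and is TinyTeX the LaTeX distribution in use?
latex_ready <- requireNamespace("tinytex", quietly = TRUE) && tinytex::is_tinytex()
latex_ready
```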
Many good resources out there. I only cite two:
Now it’s up to you to create a beautiful RMarkdown report full of cats. The more and the cuter the cats, the better.
Pick up a cute, wild species of cat and:
RMarkdown project
rgbif package (help code below)
Help code to download data from gbif
library(tidyverse)
library(rgbif)
myspecies <- "Caracal caracal" ## example
# get the GBIF species key
get.speciesKey <- function(x) {
  name_backbone(x)$speciesKey
}
key <- get.speciesKey(myspecies)
# extract the first n occurrences with coordinates from GBIF
# (the result is a list; the records are in its $data element)
get.occurrences <- function(x, n = 100) {
  occ_search(taxonKey = x, limit = n, hasCoordinate = TRUE)
}
# clean data
dat <- lapply(key, get.occurrences, n = 100)[[1]]
dat <- dat$data %>%
  dplyr::select(species, year:day, country, stateProvince,
                decimalLongitude, decimalLatitude)
sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.3.4 cats_0.1 forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.4 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
## [9] tibble_3.0.6 ggplot2_3.3.3 tidyverse_1.3.0 hexSticker_0.4.9
## [13] cowplot_1.1.1
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 sass_0.3.1 viridisLite_0.3.0
## [4] ggimage_0.2.8 jsonlite_1.7.2 showtext_0.9-2
## [7] modelr_0.1.8 bslib_0.2.4 assertthat_0.2.1
## [10] BiocManager_1.30.10 rvcheck_0.1.8 highr_0.8
## [13] showtextdb_3.0 cellranger_1.1.0 yaml_2.2.1
## [16] pillar_1.5.0 backports_1.2.1 lattice_0.20-41
## [19] glue_1.4.2 digest_0.6.27 rvest_0.3.6
## [22] colorspace_2.0-0 htmltools_0.5.1.1 pkgconfig_2.0.3
## [25] broom_0.7.5 haven_2.3.1 magick_2.6.0
## [28] sysfonts_0.8.3 webshot_0.5.2 scales_1.1.1
## [31] svglite_2.0.0 ggplotify_0.0.5 jpeg_0.1-8.1
## [34] generics_0.1.0 farver_2.0.3 ellipsis_0.3.1
## [37] withr_2.4.1 hexbin_1.28.2 cli_2.3.0
## [40] magrittr_2.0.1 crayon_1.4.1 readxl_1.3.1
## [43] evaluate_0.14 fs_1.5.0 fansi_0.4.2
## [46] xml2_1.3.2 tools_4.0.1 hms_1.0.0
## [49] lifecycle_1.0.0 munsell_0.5.0 reprex_1.0.0
## [52] compiler_4.0.1 jquerylib_0.1.3 systemfonts_1.0.1
## [55] gridGraphics_0.5-1 rlang_0.4.10 grid_4.0.1
## [58] rstudioapi_0.13 labeling_0.4.2 rmarkdown_2.7
## [61] gtable_0.3.0 DBI_1.1.1 curl_4.3
## [64] R6_2.5.0 lubridate_1.7.9.2 knitr_1.31
## [67] utf8_1.1.4 stringi_1.5.3 Rcpp_1.0.6
## [70] vctrs_0.3.6 dbplyr_2.1.0 tidyselect_1.1.0
## [73] xfun_0.21